feat: add ai-cache plugin by janiussyafiq · Pull Request #13578 · apache/apisix

janiussyafiq · 2026-06-19T09:02:54Z

Description

Adds a new ai-cache plugin that caches LLM responses and replays them for subsequent requests that resolve to the same prompt, cutting upstream token cost and latency for repetitive workloads (FAQ bots, document Q&A, translation).

This PR implements the exact (L1) cache layer:

Cache key — a SHA-256 fingerprint of the request as received: client protocol, requested model, normalized messages, and the remaining response-determining body parameters (temperature, top_p, max_tokens, tools, …). Provider-agnostic via ai-protocols, so it works for every chat protocol ai-proxy supports (OpenAI Chat, Anthropic Messages, Bedrock Converse, OpenAI Responses). The key also segments by the selected AI instance — the ai-proxy provider, or the ai-proxy-multi instance picked for the request (recomputed if a retry_on_error fallback answers) — so identical prompts that resolve to instances backed by different models or providers never share an entry.
Storage — Redis (single-node); connection fields are sourced from apisix.utils.redis-schema via the policy + if/then convention used by limit-count / limit-req / limit-conn.
Scope — per-route by default (cache_key.share_across_routes to share one cache space across routes); opt-in per-consumer / per-variable isolation (cache_key.include_consumer / include_vars).
Behavior — write-on-200 only (non-streaming); bypass_on opt-out (exact request-header match); fail_mode (skip / warn / error) when a request did not pass through ai-proxy / ai-proxy-multi; max_cache_body_size cap; X-AI-Cache-Status / X-AI-Cache-Age response headers; fails open (proxies as a normal miss) when Redis is unreachable.
Runs below ai-proxy (priority 1035) and depends on ai-proxy / ai-proxy-multi.

Semantic cache, streaming support, and observability are planned as follow-up PRs. User-facing documentation will be added in a later PR once the series is further along.

Which issue(s) this PR fixes:

Related to #13290

Checklist

I have explained the need for this PR and the problem it solves
I have explained the changes or the new features added to this PR
I have added tests corresponding to this change
I have updated the documentation to reflect this change
I have verified that this change is backward compatible

Copilot

Pull request overview

Adds a new ai-cache APISIX plugin that provides an L1 exact-match cache for non-streaming LLM requests handled by ai-proxy, using Redis as the backend and exposing cache debug headers.

Changes:

Introduces the ai-cache plugin implementation, schema, and keying logic (SHA-256 fingerprint + configurable scope).
Adds an end-to-end test suite covering MISS/HIT, bypassing, TTL expiry, scope isolation, and fail-open behavior.
Wires the plugin into the default plugin lists and build/install packaging.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`apisix/plugins/ai-cache.lua`	Core plugin logic: lookup on access, capture on body/log, Redis integration, cache headers.
`apisix/plugins/ai-cache/schema.lua`	JSON schema for plugin configuration, leveraging `apisix.utils.redis-schema` via `policy` + `if/then`.
`apisix/plugins/ai-cache/key.lua`	Cache key fingerprinting (protocol/model/messages/params) and scope computation.
`t/plugin/ai-cache.t`	New functional + unit tests for cache behavior and edge cases.
`t/admin/plugins.t`	Adds `ai-cache` to the admin plugin list expectation.
`conf/config.yaml.example`	Adds `ai-cache` to the example plugin list with priority comment.
`apisix/cli/config.lua`	Adds `ai-cache` to the CLI’s default plugin list.
`Makefile`	Installs the `apisix/plugins/ai-cache/` directory Lua modules during `make install`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ss_on Encode the request fingerprint with rapidjson (sort_keys) plus a to_rapidjson_value pass that maps the JSON null sentinel and array_mt tables, mirroring ai-transport/http.lua. core.json.stably_encode (dkjson) raised on the cjson null sentinel, so a body carrying an explicit null (e.g. OpenAI's "stop": null) would error out of the access phase. Replace the cache_bypass var-ref opt-out with bypass_on: an array of {header, equals} rules that skip the cache when a request header exactly equals its value (per rfcs#78). Exact header == value only; any matching rule triggers BYPASS. Tests: add a null-body fingerprint regression, migrate the bypass tests to bypass_on, and cover multiple rules where any match bypasses.

… update fingerprinting logic

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

…tion

Document the ai-cache plugin: description, full attribute table (incl. all Redis policy fields), and Admin API / ADC / Ingress Controller examples covering cache MISS/HIT and bypass_on. Add the page to the en and zh plugin sidebars.

…ey configuration

…oute cache sharing scenarios

nic-6443

Thanks for the quick turnaround — all my comments are addressed: per-route scoping by default with share_across_routes opt-out, red:close() on Redis errors instead of pooling a broken connection, the dead layers knob dropped, and the canonical encoding pulled up into core.json.canonical_encode (nicely de-duped with ai-transport). LGTM.

membphis · 2026-06-24T05:06:58Z

I found two merge-blocking issues in the current ai-cache implementation:

[P1] Cache key does not include the effective model or picked AI instance

ai-cache computes the fingerprint from ctx.var.request_llm_model or body.model, but it does not include ctx.picked_ai_instance_name, provider, or the route / instance effective options.model:

apisix/plugins/ai-cache/key.lua: the fingerprint uses only protocol, requested model, normalized messages, and remaining body params.
apisix/plugins/ai-cache.lua: the lookup happens in access, before the upstream request is built.
ai-proxy-multi has already selected ctx.picked_ai_instance before lower-priority plugins run, so that selected instance is available at cache lookup time.

This can return the wrong provider/model response on an ai-proxy-multi route. A request can warm the cache through instance A, then a later identical request can be routed to instance B but still hit and replay instance A's response because both requests share the same cache key.

This should be fixed before merge by including the selected AI instance and/or effective model/provider in the cache key or scope, with a regression test covering ai-proxy-multi instances that use different models or providers.

[P2] The plugin can cache ordinary JSON traffic when it is not behind `ai-proxy`

The docs say ai-cache must be used with ai-proxy or ai-proxy-multi, but the implementation does not enforce or safely bypass that condition. ai-cache.access reads any JSON request body, computes a key, and marks the request as MISS; then log writes any 200 response to Redis. There is no ctx.picked_ai_instance guard like the existing AI moderation plugins use.

If the plugin is accidentally attached at Route / Service / Consumer level without an AI proxy, ordinary JSON upstream responses can be cached and replayed. That is a surprising behavior and can leak stale or incorrect non-AI responses.

Please add a guard before key computation, either bypassing by default or using the shared ai-protocols.binding fail_mode behavior, and add coverage for the no-ai-proxy case.

…mprove cache key generation

# Conflicts: # Makefile

membphis · 2026-06-24T09:50:09Z

I rechecked the latest update. The no-ai-proxy guard looks addressed now: ai-cache.access checks ctx.picked_ai_instance first, uses the shared ai-protocols.binding fail_mode, and the tests cover both the default bypass behavior and fail_mode=error.

One merge-blocking cache-key issue still remains:

[P1] `share_across_routes` can still reuse a response across different effective models on plain `ai-proxy` routes

The new scope includes ctx.picked_ai_instance_name, which fixes the ai-proxy-multi case because different picked instances have different names. However, for the plain ai-proxy plugin, ctx.picked_ai_instance_name is only ai-proxy-<provider> (for example, ai-proxy-openai). It does not include the route-level effective model.

The fingerprint still uses the requested model from ctx.var.request_llm_model or body.model, not the effective model selected by ai-proxy from ai_instance.options.model or request_model. This leaves a real collision when cache sharing is enabled across routes:

Route A uses ai-proxy with provider=openai, options.model=gpt-4o, and ai-cache.cache_key.share_across_routes=true.
Route B uses ai-proxy with provider=openai, options.model=gpt-4o-mini, and the same Redis/cache settings.
The client sends the same body to both routes, especially if the body omits model or carries the same requested model.
Both routes compute the same scope (instance=ai-proxy-openai) and the same fingerprint, so Route B can replay Route A's cached response even though the effective upstream model is different.

This also contradicts the new docs, which say that even with share_across_routes enabled, responses from different upstream models or providers are kept in separate cache entries.

Please include the effective model in the key/scope for the plain ai-proxy case, for example by deriving it from ctx.picked_ai_instance.options.model or body.model before lookup. It would also be good to add a regression test with two plain ai-proxy routes using the same provider, different options.model, the same Redis, and share_across_routes=true; the second route should be a MISS, not a HIT.

janiussyafiq added 2 commits June 19, 2026 15:23

feat: add ai-cache plugin to installation and configuration

8cd41f8

feat: implement ai-cache plugin with Redis support and testing

1ea1aaa

dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. enhancement New feature or request plugin labels Jun 19, 2026

Merge remote-tracking branch 'upstream/master' into feat/ai-cache-exact

5c04222

nic-6443 reviewed Jun 22, 2026

View reviewed changes

Comment thread apisix/plugins/ai-cache/key.lua Outdated

shreemaan-abhishek requested a review from Copilot June 23, 2026 01:16

Copilot started reviewing on behalf of shreemaan-abhishek June 23, 2026 01:16 View session

Copilot AI reviewed Jun 23, 2026

View reviewed changes

Comment thread apisix/plugins/ai-cache/key.lua

Comment thread apisix/plugins/ai-cache.lua

Comment thread apisix/plugins/ai-cache.lua

Comment thread t/plugin/ai-cache.t

Comment thread apisix/plugins/ai-cache.lua

janiussyafiq added 2 commits June 23, 2026 09:59

feat(ai-cache): enhance body filter to handle oversized responses and…

d91e68a

… update fingerprinting logic

janiussyafiq requested a review from Copilot June 23, 2026 02:56

Copilot started reviewing on behalf of janiussyafiq June 23, 2026 02:56 View session

Copilot AI reviewed Jun 23, 2026

View reviewed changes

Comment thread apisix/plugins/ai-cache.lua

Comment thread apisix/plugins/ai-cache/schema.lua

Comment thread t/plugin/ai-cache.t Outdated

Comment thread apisix/plugins/ai-cache.lua

janiussyafiq added 2 commits June 23, 2026 11:18

feat(ai-cache): optimize body caching logic and enforce header valida…

652a89f

…tion

nic-6443 reviewed Jun 23, 2026

View reviewed changes

Comment thread apisix/plugins/ai-cache/key.lua Outdated

nic-6443 reviewed Jun 23, 2026

View reviewed changes

Comment thread apisix/plugins/ai-cache/key.lua Outdated

Comment thread apisix/plugins/ai-cache.lua Outdated

Comment thread apisix/plugins/ai-cache.lua

Comment thread apisix/plugins/ai-cache/schema.lua Outdated

janiussyafiq added 2 commits June 23, 2026 15:57

feat(ai-cache): implement canonical JSON encoding and enhance cache k…

84c5ccf

…ey configuration

feat(ai-cache): update tests for exact.ttl validation and add cross-r…

9024b70

…oute cache sharing scenarios

nic-6443 previously approved these changes Jun 23, 2026

View reviewed changes

fix(json): remove redundant require statement in json.lua

6f15de7

janiussyafiq dismissed nic-6443’s stale review via 6f15de7 June 24, 2026 01:12

shreemaan-abhishek previously approved these changes Jun 24, 2026

View reviewed changes

janiussyafiq added 2 commits June 24, 2026 15:53

feat(ai-cache): enhance error handling for unsupported requests and i…

4775bfc

…mprove cache key generation

Merge remote-tracking branch 'upstream/master' into feat/ai-cache-exact

edf5b51

# Conflicts: # Makefile

janiussyafiq dismissed shreemaan-abhishek’s stale review via edf5b51 June 24, 2026 08:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add ai-cache plugin#13578

feat: add ai-cache plugin#13578
janiussyafiq wants to merge 12 commits into
apache:masterfrom
janiussyafiq:feat/ai-cache-exact

janiussyafiq commented Jun 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nic-6443 left a comment

Uh oh!

membphis commented Jun 24, 2026

Uh oh!

membphis commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

janiussyafiq commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Which issue(s) this PR fixes:

Checklist

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nic-6443 left a comment

Choose a reason for hiding this comment

Uh oh!

membphis commented Jun 24, 2026

[P1] Cache key does not include the effective model or picked AI instance

[P2] The plugin can cache ordinary JSON traffic when it is not behind ai-proxy

Uh oh!

membphis commented Jun 24, 2026

[P1] share_across_routes can still reuse a response across different effective models on plain ai-proxy routes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

janiussyafiq commented Jun 19, 2026 •

edited

Loading

[P2] The plugin can cache ordinary JSON traffic when it is not behind `ai-proxy`

[P1] `share_across_routes` can still reuse a response across different effective models on plain `ai-proxy` routes